Principal Component Analysis (PCA) is a method used in unsupervised analysis to reduce the dimension of large data sets and is a useful tool to explore sample clusters arising based on the expression profile. The data points, that represent the samples, are projected onto the 2D plane such that they spread out in the two directions that explain most of the differences. Before plotting variance stabilizing transformations were performed. The following figures show the PCA obtained using the variance stabilizing transformation value of all genes (Figure 1). Under ideal conditions, the samples between groups should be dispersed and the samples within groups should be gathered together.
Figure 1: Principal Component Analysis (PCA) plot for all samples. The PCA was performed on all samples using the DESeq2 transformed counts (using variance stabilizing transformation) of all genes. Dots represent individual samples (identification of subject is shown with results when cursor is at the dot).
Figure 1A: Principal Component Analysis (PCA) plot for all samples. The PCA was performed on all samples using the DESeq2 transformed counts (using variance stabilizing transformation) of all genes. Dots represent individual samples (identification of subject is shown with results when cursor is at the dot).
Principal components (PC1, PC2, etc.) significant associations with Sample Covariates are listed in Table 2. They will be used, as a design model for analysis, since of theirs significant association with PC.
| Principal component | Covariate | p-value |
|---|---|---|
| PC2 | Bio_rep | 0.0058203 |
| PC2 | passage | 0.0455630 |
| PC2 | Pair | 0.0058203 |
The differential gene expression analysis was performed with DESeq2 workflow (Love et al.,2014).
The un-normalized, “raw” counts in a form of a matrix of integer values received form sequencing facility were used.
The first step was filtering of lowly expressed genes. As a threshold value Number of samples * 5 was used. The UMI read counts across all samples that were less than or equal to threshold value were filtered out and genes that pass this filter proceed further with the differential expression analysis.
The listed in Table 3 sample groups comparisons were performed.
| No | Sample Group | Control Group | Group Comparisons |
|---|---|---|---|
| 1 | 50uM ZED1227 | mock | 50uM ZED1227 VS mock |
| 2 | 100 U/mL hIFNg | mock | 100 U/mL hIFNg VS mock |
| 3 | 100 U/mL hIFNg + 50uM ZED1227 | mock | 100 U/mL hIFNg + 50uM ZED1227 VS mock |
| 4 | 100 U/mL hIFNg + 50uM ZED1227 | 100 U/mL hIFNg | 100 U/mL hIFNg + 50uM ZED1227 VS 100 U/mL hIFNg |
The results of differential regulation analysis summary are shown in Table 4. Number of up and down-regulated genes for each comparison above is defined as the Benjamini-Hochberg adjusted P-values (padj) ≤ 0.05 and log2 fold change threshold (log2FC) of (≥ |0.5|) (1.5-fold up/down-regulated).
| No | Group Comparisons | # DEG | # downregulated | # upregulated | |
|---|---|---|---|---|---|
| 50uM ZED1227 VS mock | 1 | 50uM ZED1227 VS mock | 10 | 2 | 8 |
| 100 U/mL hIFNg VS mock | 2 | 100 U/mL hIFNg VS mock | 881 | 281 | 600 |
| 100 U/mL hIFNg + 50uM ZED1227 VS mock | 3 | 100 U/mL hIFNg + 50uM ZED1227 VS mock | 704 | 198 | 506 |
| 100 U/mL hIFNg + 50uM ZED1227 VS 100 U/mL hIFNg | 4 | 100 U/mL hIFNg + 50uM ZED1227 VS 100 U/mL hIFNg | 11 | 4 | 7 |
The Volcano plot provides a way to perform a quick visual identification of the gene displaying large-magnitude changes which are also statistically significant. The plot is constructed by plotting -log10(padj) on the y-axis, and the log2 of the expression fold change between the two experimental groups on the x-axis. There are two regions of interest in the plot: those points that are found towards the top of the plot (high statistical significance) and at the extreme left or right (strongly down and up-regulated respectively).
For each differential gene expression analysis, all significantly deferentially expressed genes are listed. Significant changes are defined as padj ≤ 0.05 and log2FC threshold of (≥ |0.5|).
Full tables are available in Tables folder or using the link: Tables folder .
A heatmap is a graphical representation of data that uses a system of color-coding to represent different values. All identified DEGs, organized as a heatmap, presented in Figure 3.
Figure 3: Differential gene transcriptome for 50uM ZED1227 VS mock comparison.
Figure 3: Differential gene transcriptome for 100 U/mL hIFNg VS mock comparison.
Figure 3: Differential gene transcriptome for 100 U/mL hIFNg + 50uM ZED1227 VS mock comparison.
Figure 3: Differential gene transcriptome for 100 U/mL hIFNg + 50uM ZED1227 VS 100 U/mL hIFNg comparison.
To identify common significantly differentially regulated genes that occur between the different the group comparisons, Upset plot (Figure 4A) and Venn diagram (Figure 4B) are used. On Figure 4A the numbers on top of the barplot shows the number of unique genes common to the comparisons specified below using black circles.
Figure 4A: Visualization of the intersection of significantly differentially regulated genes from all comparisons using an UpSet plot. The numbers on top of the barplot indicate the number of unique genes common to the comparisons specified below using black circles.
## [[1]]
##
## [[2]]
Figure 4B: Venn diagrams of all differentially expressed genes.
To check similarity between individual samples, the pairwise correlation was performed. With selected DEG genes from table 6, correlation coefficient for those genes for each pair was obtained and organized in a form of heatmap (Figure 5).
*** Figure 5. The pairwise
correlation heatmap visualizes cross-correlations between the
transcriptomic profiles of the samples.
GO is the abbreviation of Gene Ontology, which is a major bioinformatics classification system to unify the presentation of gene properties across all species. It includes three main branches: cellular component, molecular function, and biological process. GO terms with padj < 0.05 are usually considered to be significant enrichment.
Full tables with all detected significant GO terms and annotated genes are available in Tables folder or can be downloaded using the link: Tables folder.
Figure 6: Gene Ontology terms Barplot of enriched terms in an identified set of genes showing altered expression in the 50uM ZED1227 VS mock groups. TOP 20 are shown.
Figure 6: Gene Ontology terms Barplot of enriched terms in an identified set of genes showing altered expression in the 100 U/mL hIFNg VS mock groups. TOP 20 are shown.
Figure 6: Gene Ontology terms Barplot of enriched terms in an identified set of genes showing altered expression in the 100 U/mL hIFNg + 50uM ZED1227 VS mock groups. TOP 20 are shown.
Figure 6: Gene Ontology terms Barplot of enriched terms in an identified set of genes showing altered expression in the 100 U/mL hIFNg + 50uM ZED1227 VS 100 U/mL hIFNg groups. TOP 20 are shown.
The Reactome is a database of reactions, pathways, and biological processes, which can be used to browse pathways and submit data to a suite of data analysis tools, containing curated annotations that cover a diverse set of topics in molecular and cellular biology. Reactome terms with padj < 0.05 are considered to be significant enrichment.
Tables with significant Reactome terms and annotated genes are available in Tables folder or can be downloaded using the link: Tables folder
Figure 7: Reactome terms Barplot of enriched terms in an identified set of genes showing altered expression in the 50uM ZED1227 VS mock groups. TOP 20 are shown.
Figure 7: Reactome terms Barplot of enriched terms in an identified set of genes showing altered expression in the 100 U/mL hIFNg VS mock groups. TOP 20 are shown.
Figure 7: Reactome terms Barplot of enriched terms in an identified set of genes showing altered expression in the 100 U/mL hIFNg + 50uM ZED1227 VS mock groups. TOP 20 are shown.
Figure 7: Reactome terms Barplot of enriched terms in an identified set of genes showing altered expression in the 100 U/mL hIFNg + 50uM ZED1227 VS 100 U/mL hIFNg groups. TOP 20 are shown.
*** Figure 8:
Transglutaminase 2 (TGM2) mRNA expression.
To assess how reduced by drug TGM2 expression affects the pathways it is involved in we applied Gene Set Z-score (GSZ) calculation approach (Törönen et al., 2009) .
GSZ is defined as the difference between the error-weighted mean of the expression values of the genes in each pathway and the error-weighted mean of all genes in a sample after normalization. The number obtained shows how active a certain biological pathway or gene set in a specified sample.
To select gene sets where TGM2 is participating, from all enriched GO terms identified in step 4, only those who contain TGM2 gene were selected. Table with significant GO terms is available in Tables folder or could be downloaded using the link: TGM2 GO terms. For subsequent analysis child terms, i.e. those ones that are closer to the leaf nodes, were selected.
## Number of unique permutations: 20
## Number of unique permutations: 20Number of unique permutations: 10Number of unique permutations: 10
Figure 9. Sample GSZs for TGM2 involved GO terms.
To be able to compare group gene sets activity, the gene set analysis method described in Mishra et al., 2014, together with asymptotic P -values calculation was applied to our data set.
*** Figure 10. GSZ group
comparisons for TGM2 involved GO terms.
## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: Europe/Helsinki
## tzcode source: internal
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] zousan_0.0.0.170825 mGSZ_1.0
## [3] limma_3.60.6 ismev_1.42
## [5] mgcv_1.9-1 nlme_3.1-167
## [7] MASS_7.3-65 GSA_1.03.3
## [9] numform_0.7.0 reshape2_1.4.4
## [11] ReactomePA_1.48.0 org.Hs.eg.db_3.19.1
## [13] topGO_2.56.0 SparseM_1.84-2
## [15] graph_1.82.0 GOxploreR_1.2.7
## [17] ggraph_2.2.1 gontr_1.1.0
## [19] GO.db_3.19.1 AnnotationDbi_1.66.0
## [21] ggplotify_0.1.2 VennDiagram_1.7.3
## [23] futile.logger_1.4.3 UpSetR_1.4.0
## [25] ComplexHeatmap_2.20.0 circlize_0.4.16
## [27] htmltools_0.5.8.1 RCurl_1.98-1.17
## [29] formattable_0.2.1 ggrepel_0.9.6
## [31] writexl_1.5.2 rlist_0.4.6.2
## [33] biomaRt_2.60.1 kableExtra_1.4.0
## [35] DEGreport_1.40.1 htmlwidgets_1.6.4
## [37] lubridate_1.9.4 forcats_1.0.0
## [39] stringr_1.5.1 dplyr_1.1.4
## [41] purrr_1.0.4 readr_2.1.5
## [43] tidyr_1.3.1 tibble_3.2.1
## [45] tidyverse_2.0.0 plotly_4.10.4
## [47] scales_1.3.0 ggpubr_0.6.0
## [49] ggplot2_3.5.1 DESeq2_1.44.0
## [51] SummarizedExperiment_1.34.0 Biobase_2.64.0
## [53] MatrixGenerics_1.16.0 matrixStats_1.5.0
## [55] GenomicRanges_1.56.2 GenomeInfoDb_1.40.1
## [57] IRanges_2.38.1 S4Vectors_0.42.1
## [59] BiocGenerics_0.50.0 DT_0.33
## [61] readxl_1.4.5
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.5 bitops_1.0-9
## [3] enrichplot_1.24.4 httr_1.4.7
## [5] RColorBrewer_1.1-3 doParallel_1.0.17
## [7] tools_4.4.0 backports_1.5.0
## [9] R6_2.6.1 lazyeval_0.2.2
## [11] GetoptLong_1.0.5 withr_3.0.2
## [13] graphite_1.50.0 prettyunits_1.2.0
## [15] gridExtra_2.3 cli_3.6.4
## [17] formatR_1.14 logging_0.10-108
## [19] Cairo_1.6-2 scatterpie_0.2.4
## [21] network_1.19.0 labeling_0.4.3
## [23] sass_0.4.9 systemfonts_1.2.1
## [25] yulab.utils_0.2.0 gson_0.1.0
## [27] DOSE_3.30.5 svglite_2.1.3
## [29] R.utils_2.13.0 rstudioapi_0.17.1
## [31] RSQLite_2.3.9 generics_0.1.3
## [33] gridGraphics_0.5-1 shape_1.4.6.1
## [35] crosstalk_1.2.1 car_3.1-3
## [37] Matrix_1.7-3 abind_1.4-8
## [39] R.methodsS3_1.8.2 lifecycle_1.0.4
## [41] yaml_2.3.10 edgeR_4.2.2
## [43] carData_3.0-5 qvalue_2.36.0
## [45] SparseArray_1.4.8 BiocFileCache_2.12.0
## [47] blob_1.2.4 crayon_1.5.3
## [49] lattice_0.22-6 cowplot_1.1.3
## [51] annotate_1.82.0 KEGGREST_1.44.1
## [53] pillar_1.10.1 knitr_1.50
## [55] fgsea_1.30.0 rjson_0.2.23
## [57] codetools_0.2-20 fastmatch_1.1-6
## [59] glue_1.8.0 ggfun_0.1.8
## [61] data.table_1.17.0 treeio_1.28.0
## [63] vctrs_0.6.5 png_0.1-8
## [65] cellranger_1.1.0 gtable_0.3.6
## [67] cachem_1.1.0 xfun_0.51
## [69] S4Arrays_1.4.1 tidygraph_1.3.1
## [71] ConsensusClusterPlus_1.68.0 coda_0.19-4.1
## [73] iterators_1.0.14 statmod_1.5.0
## [75] ggtree_3.12.0 bit64_4.6.0-1
## [77] progress_1.2.3 filelock_1.0.3
## [79] bslib_0.9.0 colorspace_2.1-1
## [81] DBI_1.2.3 mnormt_2.1.1
## [83] tidyselect_1.2.1 bit_4.6.0
## [85] compiler_4.4.0 curl_6.2.1
## [87] httr2_1.1.1 xml2_1.3.8
## [89] ggdendro_0.2.0 DelayedArray_0.30.1
## [91] shadowtext_0.1.4 psych_2.5.3
## [93] rappdirs_0.3.3 digest_0.6.37
## [95] rmarkdown_2.29 XVector_0.44.0
## [97] pkgconfig_2.0.3 dbplyr_2.5.0
## [99] fastmap_1.2.0 rlang_1.1.5
## [101] GlobalOptions_0.1.2 UCSC.utils_1.0.0
## [103] farver_2.1.2 jquerylib_0.1.4
## [105] jsonlite_1.9.1 BiocParallel_1.38.0
## [107] statnet.common_4.11.0 GOSemSim_2.30.2
## [109] R.oo_1.27.0 magrittr_2.0.3
## [111] Formula_1.2-5 GenomeInfoDbData_1.2.12
## [113] patchwork_1.3.0 munsell_0.5.1
## [115] Rcpp_1.0.14 ape_5.8-1
## [117] viridis_0.6.5 stringi_1.8.4
## [119] zlibbioc_1.50.0 plyr_1.8.9
## [121] parallel_4.4.0 Biostrings_2.72.1
## [123] graphlayouts_1.2.2 splines_4.4.0
## [125] hms_1.1.3 locfit_1.5-9.12
## [127] igraph_2.1.4 ggsignif_0.6.4
## [129] futile.options_1.0.1 XML_3.99-0.18
## [131] evaluate_1.0.3 lambda.r_1.2.4
## [133] tzdb_0.5.0 foreach_1.5.2
## [135] tweenr_2.0.3 polyclip_1.10-7
## [137] reshape_0.8.9 clue_0.3-66
## [139] ggforce_0.4.2 broom_1.0.7
## [141] xtable_1.8-4 reactome.db_1.88.0
## [143] tidytree_0.4.6 rstatix_0.7.2
## [145] viridisLite_0.4.2 aplot_0.2.5
## [147] memoise_2.0.1 cluster_2.1.8.1
## [149] timechange_0.3.0